Traditional cloud storage platforms function as passive data repositories, lacking the capability to semantically understand stored document contents and forcing users into laborious manual retrieval workflows. Concurrently, sophisticated Layer-7 Distributed Denial-of-Service (DDoS) attacks targeting API endpoints threaten both service availability and operational economics, particularly for AI-enhanced platforms where every inference request incurs computational cost. This paper presents CloudDrive-AI, a comprehensive intelligent cloud storage framework integrating three synergistic technologies: (1) a federated multi-cluster machine learning architecture providing real-time DDoS detection via Isolation Forest anomaly scoring with 2-of-3 majority-vote aggregation, eliminating the single points of failure inherent in centralised security systems; (2) a robust OCR pipeline built on Tesseract V5 Long Short-Term Memory (LSTM) networks for extracting text from scanned documents and images; and (3) a context-based question answering framework that sends extracted OCR text to the Google Gemini API to deliver grounded, hallucination-free responses over user-uploaded documents. In a controlled attack simulation, the federated security layer detects upload-based DDoS attacks and blocks malicious traffic while allowing legitimate user requests. The OCR pipeline attains practical accuracy on both clean scans and smartphone photographs. CloudDrive-AI transforms passive cloud repositories into active, intelligent, and secure document management ecosystems.
Introduction
This paper presents CloudDrive-AI, a cloud storage system designed to overcome limitations in traditional cloud platforms while addressing security threats and improving document intelligence.
Modern cloud services like AWS S3, Google Drive, and OneDrive mainly act as passive storage systems with weak search capabilities limited to file metadata. This makes it difficult to find specific information inside large unstructured files such as PDFs, scanned documents, or images. At the same time, cloud platforms face growing security risks from advanced Layer-7 DDoS attacks that exploit expensive AI-based operations like OCR and LLM inference. Additionally, LLM-based systems can produce hallucinated (incorrect) answers if not properly grounded in real data.
To solve these issues, CloudDrive-AI introduces three main components:
(1) A federated multi-cluster DDoS defense that uses Isolation Forest models with majority voting to improve security and avoid single points of failure.
(2) An OCR pipeline that uses Tesseract V5 to extract text from unstructured documents such as scans and images.
(3) A grounded question answering system that uses the Google Gemini API and answers strictly from extracted document content to prevent hallucinations.
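The majority-voting step in the first component can be illustrated with a minimal sketch. It is not the CloudDrive-AI implementation: the function names, the threshold, and the treatment of an unreachable cluster as an abstention are assumptions made for illustration. The sign convention follows scikit-learn's `IsolationForest.decision_function`, where negative scores indicate outliers; the per-cluster models themselves are omitted and only their scores are aggregated.

```python
from typing import Optional, Sequence


def cluster_verdict(score: float, threshold: float = 0.0) -> bool:
    """Turn one cluster's anomaly score into a verdict.

    Assumption: scores follow scikit-learn's decision_function
    convention, so values below the threshold are anomalous.
    """
    return score < threshold


def majority_vote(verdicts: Sequence[Optional[bool]], quorum: int = 2) -> bool:
    """Return True (block the request) when at least `quorum` clusters flag it.

    Each verdict is True (anomalous), False (normal), or None when the
    cluster is unreachable. An unreachable cluster casts no vote
    (an assumption of this sketch).
    """
    flagged = sum(1 for verdict in verdicts if verdict is True)
    return flagged >= quorum


if __name__ == "__main__":
    # Hypothetical scores reported by three clusters for one upload request.
    scores = [-0.12, 0.05, -0.03]
    verdicts = [cluster_verdict(s) for s in scores]
    print(majority_vote(verdicts))  # two of three clusters flag the request

    # One cluster is down; the two remaining clusters still reach the quorum.
    print(majority_vote([True, None, True]))
```

With a quorum of two out of three, a single failed or compromised cluster cannot block or admit a request on its own. Counting an unreachable cluster as an abstention makes the sketch fail open when only one healthy cluster flags a request; a deployment that prefers to fail closed would count the missing vote as a flag instead.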
The literature review highlights progress in OCR (especially LSTM-based Tesseract improvements), anomaly detection using Isolation Forest for DDoS defense, and context-based LLM systems for document question answering. However, existing systems are mostly isolated and do not combine security, OCR, and AI reasoning in one unified framework.
Conclusion
This paper presented CloudDrive-AI, an intelligent cloud storage framework addressing three interconnected limitations of contemporary platforms: semantic blindness to document contents, vulnerability to economically motivated Layer-7 DDoS attacks, and the hallucination risk of naive LLM integration.
Three empirically validated architectural contributions realise this vision. The federated Isolation Forest DDoS defense demonstrates effective detection of upload-based attacks, as validated through a controlled attack simulator, with Byzantine fault tolerance that sustains protection under targeted node failure. The Tesseract V5 OCR pipeline provides practical text extraction from both embedded-text PDFs and scanned documents through a conditional processing workflow that converts heterogeneous real-world documents into searchable text. The context-based Gemini integration eliminates hallucination by grounding generation strictly to user-provided OCR text, as confirmed by the absence of unverifiable claims during testing.
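The conditional processing workflow and the grounding step described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the system's code: `run_ocr` is a hypothetical callable standing in for the Tesseract call, `MIN_EMBEDDED_CHARS` is an invented threshold for deciding whether an embedded text layer is usable, and the prompt wording is one possible way to restrict answers to the supplied text. The request to the Gemini API is omitted.

```python
from typing import Callable

# Hypothetical threshold: shorter embedded text is treated as "no text layer".
MIN_EMBEDDED_CHARS = 20


def extract_text(
    embedded_text: str,
    run_ocr: Callable[[], str],
    min_chars: int = MIN_EMBEDDED_CHARS,
) -> str:
    """Prefer the document's embedded text; fall back to OCR otherwise.

    `run_ocr` is only invoked when the embedded text is too short,
    so documents with a usable text layer skip the OCR cost.
    """
    cleaned = embedded_text.strip()
    if len(cleaned) >= min_chars:
        return cleaned
    return run_ocr().strip()


def build_grounded_prompt(context: str, question: str) -> str:
    """Build a prompt that asks the model to answer only from `context`."""
    return (
        "Answer using only the document text between the markers. "
        "If the answer is not in the text, reply exactly: "
        "'Not found in the document.'\n"
        "<document>\n" + context + "\n</document>\n"
        "Question: " + question
    )


if __name__ == "__main__":
    # A scanned page has no embedded text, so the OCR stand-in is used.
    text = extract_text("", lambda: "Invoice total: 42 USD\n")
    print(build_grounded_prompt(text, "What is the invoice total?"))
```

Passing the OCR step as a callable keeps the routing logic independent of any particular OCR binding and makes it straightforward to test without an OCR engine installed.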
The integrated system maintains sub-second response times under moderate concurrent load on commodity cloud infrastructure, confirming practical deployability. CloudDrive-AI's modular, open architecture supports fully self-hosted deployment with complete data sovereignty, addressing compliance requirements that prevent many organisations from adopting closed-source AI-enhanced storage services. This work provides a concrete reference implementation and empirical baseline for the next generation of intelligent, secure cloud storage infrastructure, where data is not merely preserved but actively understood and made accessible through natural, trustworthy conversation.